Claims 



What is claimed is: 



1. A method for document analysis and retrieval, comprising the following steps performed in the order recited:

transmitting, by a remote host in a first computing system to a web service host in a second computing system, a first portion of a document; and

sequentially transmitting, by the remote host to the web service host, at least one additional portion of the document, wherein the first portion and the at least one additional portion collectively comprise the entire document, wherein the entire document is adapted to be reconstructed and subsequently processed via processing said entire document by the web service host, said processing comprising at least one of:

extracting text from said entire document to configure said text in a text format, if said entire document received by said web service host comprises said text in a non-text format;

generating document keys associated with said text from analysis of said text in said text format, if said entire document received by said web service host comprises said text in said text format, or if said web service host has previously performed said extracting such that said text in said text format is available to said web service host; and

determining, from given categories of a document taxonomy, a set of closest categories to the document based on a comparison between the document keys and category keys of the given categories, if said entire document received by said web service host comprises said document keys, or if said web service host has previously performed said generating such that said document keys are available to said web service host.
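Claim 1 recites sending a document as a first portion followed by at least one additional portion, which the web service host then reassembles into the entire document. A minimal Python sketch of that chunking and reassembly follows; the fixed chunk size, the sequence numbers, and the names `split_into_portions` and `reconstruct` are hypothetical illustrations, not details recited in the claims.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical portion size in bytes; the claims do not specify how the document is divided.
CHUNK_SIZE = 4096


def split_into_portions(document: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Remote-host side: yield (sequence_number, portion) pairs that together cover the entire document."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for seq, start in enumerate(range(0, len(document), chunk_size)):
        yield seq, document[start:start + chunk_size]


def reconstruct(portions: Iterable[Tuple[int, bytes]]) -> bytes:
    """Web-service-host side: reassemble the entire document in sequence-number order."""
    return b"".join(chunk for _, chunk in sorted(portions, key=lambda p: p[0]))


if __name__ == "__main__":
    doc = b"example document " * 1000
    portions = list(split_into_portions(doc))
    # Reassembly sorts by sequence number, so it does not rely on arrival order.
    assert reconstruct(reversed(portions)) == doc
    print(len(portions), "portions")
```

Carrying a sequence number with each portion is one design choice among several; an ordered transport would make it unnecessary.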

2. The method of claim 1, further comprising, prior to said transmitting, identifying said web service host, said identifying comprising:

executing a Universal Description, Discovery, and Integration (UDDI) search to identify one or more web service hosts that can receive said document in chunks and that can perform said at least one of said extracting, generating, and determining; and

selecting said web service host from said one or more web service hosts.

3. The method of claim 1, wherein said transmitting and sequentially transmitting comprise respectively transmitting and sequentially transmitting the first portion and the at least one additional portion via Internet transmission to said web service host.

4. The method of claim 1, wherein said generating comprises:

generating tokens of said text such that stop words do not appear in said tokens; and

stemming said tokens to generate said document keys from said tokens.
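Claim 4 (like claims 13, 22, and 33) splits key generation into tokenizing without stop words and then stemming. The Python sketch below illustrates one way to do this; the stop-word list and the crude suffix-stripping stemmer are hypothetical stand-ins (a production system would more likely use a published algorithm such as the Porter stemmer), and the claims prescribe neither.

```python
import re
from typing import List

# Hypothetical stop-word list; a real deployment would use a fuller, language-specific list.
STOP_WORDS = frozenset({"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"})

# Crude suffix stripping, standing in for a real stemmer such as Porter's.
SUFFIXES = ("ingly", "edly", "ing", "ed", "es", "ly", "s")


def tokenize(text: str) -> List[str]:
    """Lowercase the text, split it into alphanumeric tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def document_keys(text: str) -> List[str]:
    """Tokenize without stop words, then stem each token to obtain document keys."""
    return [stem(t) for t in tokenize(text)]
```

For example, `document_keys("The documents are indexed for searching")` drops "the", "are", and "for" and stems the remaining tokens to "document", "index", and "search".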
5. The method of claim 1, wherein said processing comprises said extracting, said generating, and said determining.

6. The method of claim 1, wherein said processing consists of two of said extracting, said generating, and said determining.

7. The method of claim 1, wherein said processing comprises said extracting but not said generating and not said determining.
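Claim 7 covers a web service host that performs only the extracting step. As one hedged illustration, assuming the non-text format is HTML (the claims name no particular format), Python's standard `html.parser` module can collect the visible character data; `extract_text` and `_TextCollector` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List


class _TextCollector(HTMLParser):
    """Collects character data, skipping the contents of <script> and <style> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0  # depth of nested script/style elements currently open

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as a single space-joined string."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any buffered data
    return " ".join(collector.parts)
```

Other non-text formats (for example PDF or word-processor files) would need their own extractors; this sketch covers HTML only.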

8. The method of claim 1, wherein said processing comprises said generating but not said extracting and not said determining.

9. The method of claim 1, wherein said processing comprises said determining but not said extracting and not said generating.
10. A system for document analysis and retrieval, comprising a first computing system that includes a remote host, wherein the remote host is remote relative to a web service host in a second computing system, and wherein the remote host is adapted to:

transmit a first portion of a document to the web service host; and

sequentially transmit at least one additional portion of the document to the web service host, wherein the first portion and the at least one additional portion collectively comprise the entire document, wherein the entire document is adapted to be reconstructed and subsequently processed via processing said entire document by the web service host, said processing comprising at least one of:

extracting text from said entire document to configure said text in a text format, if said entire document received by said web service host comprises said text in a non-text format;

generating document keys associated with said text from analysis of said text in said text format, if said entire document received by said web service host comprises said text in said text format, or if said web service host has previously performed said extracting such that said text in said text format is available to said web service host; and

determining, from given categories of a document taxonomy, a set of closest categories to the document based on a comparison between the document keys and category keys of the given categories, if said entire document received by said web service host comprises said document keys, or if said web service host has previously performed said generating such that said document keys are available to said web service host.



11. The system of claim 10, wherein the remote host is adapted to identify said web service host by:

executing a Universal Description, Discovery, and Integration (UDDI) search to identify one or more web service hosts that can receive said document in chunks and that can perform said at least one of said extracting, generating, and determining; and

selecting said web service host from said one or more web service hosts.

12. The system of claim 10, wherein to transmit and to sequentially transmit comprise to respectively transmit and sequentially transmit the first portion and the at least one additional portion via Internet transmission to said web service host.

13. The system of claim 10, wherein said generating comprises:

generating tokens of said text such that stop words do not appear in said tokens; and

stemming said tokens to generate said document keys from said tokens.

14. The system of claim 10, wherein said processing comprises said extracting, said generating, and said determining.

15. The system of claim 10, wherein said processing consists of two of said extracting, said generating, and said determining.

16. The system of claim 10, wherein said processing comprises said extracting but not said generating and not said determining.

17. The system of claim 10, wherein said processing comprises said generating but not said extracting and not said determining.

18. The system of claim 10, wherein said processing comprises said determining but not said extracting and not said generating.
19. A method for document analysis and retrieval, comprising the following steps performed in the order recited:

receiving, by a web service host in a second computing system from a remote host in a first computing system, a first portion of a document;

sequentially receiving, by the web service host from the remote host, at least one additional portion of the document, wherein the first portion and the at least one additional portion collectively comprise the entire document;

reconstructing the entire document from the first portion and the at least one additional portion; and

processing the entire document by the web service host, wherein said processing comprises at least one of:

extracting text from said entire document to configure said text in a text format, if said entire document received by said web service host comprises said text in a non-text format;

generating document keys associated with said text from analysis of said text in said text format, if said entire document received by said web service host comprises said text in said text format, or if said web service host has previously performed said extracting such that said text in said text format is available to said web service host; and

determining, from given categories of a document taxonomy, a set of closest categories to the document, if said entire document received by said web service host comprises said document keys, or if said web service host has previously performed said generating such that said document keys are available to said web service host.
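Each processing step in claim 19 runs only if its input is available: extracting needs a document in a non-text format, generating needs text in a text format, and determining needs document keys. The sketch below models that conditional ordering under stated assumptions; the `Document` container, the step names, and the caller-supplied `extract`, `generate`, and `determine` callables are all hypothetical, and the optional `steps` set is meant to mirror the subsets recited in claims 23 through 27.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional

ALL_STEPS: FrozenSet[str] = frozenset({"extract", "generate", "determine"})


@dataclass
class Document:
    """Hypothetical record of what the web service host holds for the reconstructed document."""
    raw: Optional[bytes] = None              # content in a non-text format
    text: Optional[str] = None               # content in a text format
    keys: Optional[List[str]] = None         # document keys
    categories: Optional[List[str]] = None   # set of closest categories


def process(doc: Document,
            extract: Callable[[bytes], str],
            generate: Callable[[str], List[str]],
            determine: Callable[[List[str]], List[str]],
            steps: FrozenSet[str] = ALL_STEPS) -> Document:
    """Run the enabled steps in the recited order, each only when its input is available."""
    if "extract" in steps and doc.text is None and doc.raw is not None:
        doc.text = extract(doc.raw)
    if "generate" in steps and doc.keys is None and doc.text is not None:
        doc.keys = generate(doc.text)
    if "determine" in steps and doc.keys is not None:
        doc.categories = determine(doc.keys)
    return doc
```

A document that arrives already as text skips extraction, and one that arrives with document keys goes straight to determining.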

20. The method of claim 19, wherein the web service host is listed in a Universal Description, Discovery, and Integration (UDDI) registry as being able to receive said document in chunks and being able to perform said at least one of said extracting, generating, and determining.

21. The method of claim 19, wherein said receiving and sequentially receiving steps comprise receiving the first portion and the at least one additional portion via Internet transmission from said remote host.

22. The method of claim 19, wherein said generating comprises:

generating tokens of said text such that stop words do not appear in said tokens; and

stemming said tokens to generate said document keys from said tokens.

23. The method of claim 19, wherein said processing comprises said extracting, said generating, and said determining.

24. The method of claim 19, wherein said processing consists of two of said extracting, said generating, and said determining.

25. The method of claim 19, wherein said processing comprises said extracting but not said generating and not said determining.

26. The method of claim 19, wherein said processing comprises said generating but not said extracting and not said determining.

27. The method of claim 19, wherein said processing comprises said determining but not said extracting and not said generating.

28. The method of claim 19, wherein said determining comprises:

comparing the category keys of each category with said document keys to make a determination of a distance between the document and each category as a measure of how close the document is to each category; and

determining said set of closest categories based on said determination.
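Claim 28 compares category keys with document keys to obtain a distance for each category. The claims leave the distance measure open; the sketch below assumes cosine distance over key-frequency vectors and a hypothetical cutoff `k` for the size of the set of closest categories.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine_distance(a: List[str], b: List[str]) -> float:
    """Return 1 minus the cosine similarity of the key-frequency vectors of a and b."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    if norm == 0:
        return 1.0  # treat an empty key list as maximally distant
    return 1.0 - dot / norm


def closest_categories(document_keys: List[str],
                       category_keys: Dict[str, List[str]],
                       k: int = 3) -> List[str]:
    """Rank categories by distance to the document (ties broken by name) and keep the k closest."""
    distances = {name: cosine_distance(document_keys, keys) for name, keys in category_keys.items()}
    return sorted(distances, key=lambda name: (distances[name], name))[:k]
```

A fixed `k` is only one way to bound the set; a distance threshold would fit the claim language equally well.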

29. The method of claim 19, wherein said processing comprises said determining, and wherein the method further comprises:

creating a search string, said search string comprising a logical function of a subset of said document keys;

submitting said search string to a search engine;

receiving links to related documents from said search engine, said links being based on said search string; and

returning said links to said remote host.
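Claim 29 builds a search string as a logical function of a subset of the document keys. One hedged reading, sketched below, takes the `n` most frequent keys and joins them with a Boolean `AND`; both the selection rule and the operator syntax are assumptions, since the claims name neither the subset nor the search engine's query language, and submitting the string to an engine is not shown.

```python
from collections import Counter
from typing import List


def build_search_string(document_keys: List[str], n: int = 3, operator: str = "AND") -> str:
    """Join the n most frequent document keys with a Boolean operator (ties keep first-seen order)."""
    if n <= 0:
        raise ValueError("n must be positive")
    top = [key for key, _ in Counter(document_keys).most_common(n)]
    return f" {operator} ".join(top)


if __name__ == "__main__":
    keys = ["search", "index", "search", "document", "index", "search", "stem"]
    print(build_search_string(keys, n=2))  # prints: search AND index
```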



LOT920030003US1 



30. A system for document analysis and retrieval, comprising a second computing system that includes a web service host, wherein the web service host is remote relative to a remote host in a first computing system, and wherein the web service host is adapted to:

receive a first portion of a document from the remote host;

sequentially receive at least one additional portion of the document from the remote host, wherein the first portion and the at least one additional portion collectively comprise the entire document;

reconstruct the entire document from the first portion and the at least one additional portion; and

process the entire document, said processing comprising at least one of:

extracting text from said entire document to configure said text in a text format, if said entire document received by said web service host comprises said text in a non-text format;

generating document keys associated with said text from analysis of said text in said text format, if said entire document received by said web service host comprises said text in said text format, or if said web service host has previously performed said extracting such that said text in said text format is available to said web service host; and

determining, from given categories of a document taxonomy, a set of closest categories to the document, if said entire document received by said web service host comprises said document keys, or if said web service host has previously performed said generating such that said document keys are available to said web service host.



31. The system of claim 30, wherein the web service host is listed in a Universal Description, Discovery, and Integration (UDDI) registry as being able to receive said document in chunks and being able to perform said at least one of said extracting, generating, and determining.

32. The system of claim 30, wherein to receive and to sequentially receive comprise to receive the first portion and the at least one additional portion via Internet transmission from said remote host.

33. The system of claim 30, wherein said generating comprises:

generating tokens of said text such that stop words do not appear in said tokens; and

stemming said tokens to generate said document keys from said tokens.

34. The system of claim 30, wherein said processing comprises said extracting, said generating, and said determining.

35. The system of claim 30, wherein said processing consists of two of said extracting, said generating, and said determining.

36. The system of claim 30, wherein said processing comprises said extracting but not said generating and not said determining.

37. The system of claim 30, wherein said processing comprises said generating but not said extracting and not said determining.

38. The system of claim 30, wherein said processing comprises said determining but not said extracting and not said generating.

39. The system of claim 30, wherein said determining comprises:

comparing the category keys of each category with said document keys to make a determination of a distance between the document and each category as a measure of how close the document is to each category; and

determining said set of closest categories based on said determination.

40. The system of claim 30, wherein said processing comprises said determining, and wherein the web service host is further adapted to perform:

creating a search string, said search string comprising a logical function of a subset of said document keys;

submitting said search string to a search engine;

receiving links to related documents from said search engine, said links being based on said search string; and

returning said links to said remote host.